METHOD FOR THE STABILIZATION OF PROTEINS AND THE
THERMOSTABILIZED ALCOHOL DEHYDROGENASES PRODUCED THEREBY



TECHNICAL FIELD OF THE INVENTION 

The present invention generally relates to a method
for the directed evolution of proteins. In particular,
the method is directed to stabilization of proteins such
as dehydrogenases, and particularly is directed to a
method for improving the thermostability of
dehydrogenases such as alcohol dehydrogenases. The
present invention also relates to thermostabilized 
alcohol dehydrogenases produced according to this method. 



BACKGROUND OF THE INVENTION 

Biocatalysts are enzymes which can specifically and
efficiently expedite chemical reactions such as the
synthesis of chemical compounds and biopolymers (Dixon et
al., Enzymes (Academic Press, New York: 1979)).
Biocatalysts are the key players in a number of important
industrial synthetic and degradative applications
including, but not limited to, the following:

• Synthetic Applications - Biocatalysts currently are
employed as feasible alternatives to traditional
catalysts, especially for the synthesis of chiral
intermediates, or in the reduction of the number of
protection/deprotection steps.

• Biodegradation Applications - Biocatalysts currently
are employed as enzymatic degradation agents for
environmental pollutants such as PCBs, chlorinated
hydrocarbons, RDX, halogenated organic compounds,
TNT, and other byproducts of industrial production
that present significant health risks.

• Diagnostics and Biosensors - Biocatalysts currently
are employed as detection agents in diagnostic tests
and as biosensors which require enzyme durability.

• Other large-scale industrial applications -
Biocatalysts currently are employed as catalysts in
the production of fuel supplies through conversion
of agricultural feedstocks.

One enzyme that is of considerable utility in 
current enzymatic processes is the dehydrogenase. In 
particular, alcohol dehydrogenases are enzymes that 
command formal, reversible, two-electron chemistry in 
which alcohols are oxidized to the corresponding ketones. 
Depending on the precise reaction conditions, ketones can 
be reduced to the respective alcohols via a 
stereospecific delivery of a hydride equivalent catalyzed 
by the enzyme coupled to a bound cofactor such as NADH or
NADPH (Lemiere, "Alcohol Dehydrogenase Catalyzed
Oxidoreduction Reactions in Organic Chemistry", In
Enzymes as Catalysts in Organic Synthesis, Schneider et
al., Eds. (1986) p. 17). This system thus provides a
mild, extremely sensitive route to chiral compounds, 
without contamination from undesired, competing 
reactions. 

Such chiral compounds can be used, especially by the 
pharmaceutical industry, for the preparation of chiral 
therapeutics, and for effectively generating a wide 
variety of compounds having the capacity for industrial 
scale-up (Seebach et al., Org. Synth., 63, i-_ (1984);
Bradshaw et al., J. Org. Chem., 57, 1532 (1992); Hummel,
Biotechnol. Lett., 12, 403 (1990)). In particular,
dehydrogenases show promise for commercial application in
the preparation of unusual amino acids and
β-hydroxyketones, and in the resolution of racemic alcohols
(Benoiton et al., J. Am. Chem. Soc., 79, 6192 (1957);
Casy et al., Tetrahedron Lett., 8 17 (1992); Jacovac
et al., J. Am. Chem. Soc., 104, 4659-4665 (1982); Jones
et al., Can. J. Chem., 60, 19 (1982)). Of the
dehydrogenases, horse liver alcohol dehydrogenase (HLADH)
is one of the most commonly used.

For an enzyme biocatalyst such as HLADH to prove 
useful in a wide-scale, practical, industrial
application, it is important that the biocatalyst possess
the ability to survive harsh, dynamic, environmental and
handling conditions inherent to large-scale commercial
processes. These conditions include nonrefrigerated
storage, and exposure to organic cosolvents and high
reaction temperatures, as well as more idiosyncratic 
demands imposed by a particular industrial application. 

To date, one of the greatest challenges associated
with biocatalyst implementation is that of overcoming an
overall intrinsic instability that results in a
requirement for special preparative approaches and
handling conditions. Many methods have been used in an
attempt to stabilize certain proteins. Rational protein
engineering has allowed the redesign of proteins with
altered properties such as enhanced stability, shifted pH
optima, and different substrate specificities (see, e.g.,
Bryan et al., Proteins, 1, 326-334 (1986); Pantoliano et
al., Biochemistry, 26, 2077-82 (1987); Carter et al.,
Science, 237, 394-399 (1987); Wells et al., "Designing
substrate specificity by protein engineering of
electrostatic interactions", , 84, 1219-1223 (1987);
Grutter et al., Nature, 277, 667-669 (1979)).

While potentially an extremely powerful tool,
rational protein engineering can be extremely
time-consuming and expensive, and currently can be employed
only for a very small number of enzymes having well- 
defined crystal or solution structures. Moreover, since 
the approach is tailored to a specific enzyme, it 
typically cannot be generalized to other enzyme species. 

Other post-production stabilization methods such as
immobilization (Macaskie et al., FEMS Microbiol. Rev.,
14, 351-67 (1994); Shtelzer et al., Biotechnol. Appl.
Biochem., 15, 227-35 (1992); Phadke, Biosystems, 27, 203-6
(1992)), or use of cross-linked enzymes (Navia et al.,
"Crosslinked enzyme crystals as robust biocatalysts",
Proceedings of the Materials Research Society 1993
Symposium, Biomolecular Materials by Design (1993)),
suffer some of the same as well as further shortcomings,
and similarly, are often too expensive to implement. 

By contrast, directed evolution potentially can
provide a practical approach to tailoring enzymes for a
wide range of applications (Shao et al., "Engineering New
Functions and Altering Existing Functions", Current
Opinion in Structural Biology, in press (1996)). In
support of this, enzymes have been shown to be highly
adaptable molecules over evolutionary time scales. Many
enzymes catalyzing very different reactions appear to
have come about by divergent evolution, acquiring diverse
capabilities by the processes of random mutation, 
recombination, and natural selection. 

Thus, there remains a need for an effective means to
randomly engineer better enzymes, particularly
dehydrogenases, and especially, HLADH. The present
invention seeks to overcome some of the aforesaid
problems of enzyme design. In particular, it is an
object of the present invention to provide a method for
the directed evolution of enzymes, particularly
dehydrogenases, and especially HLADH. It further is an
object of the present invention to provide a method for
stabilizing, e.g., improving the thermostability of,
enzymes such as dehydrogenases. Such a method of
stabilizing dehydrogenases (particularly HLADH) would
present a major advancement in the field since it would
extend the shelf life, longevity, and active temperature
range of these enzymes. These and other objects and
advantages of the present invention, as well as further
inventive features, will be apparent from the description
of the invention provided herein.

BRIEF SUMMARY OF THE INVENTION 

Briefly, the present invention provides, inter alia,
a method for the stabilization of a protein (particularly
for the stabilization of an alcohol dehydrogenase such as
horse liver alcohol dehydrogenase (HLADH)), general
enrichment/selection means that can be employed in
Escherichia and Thermus to select for cells having
altered levels of alcohol dehydrogenase activity as
compared to a wild-type cell, thermostabilized HLADH
proteins and nucleic acid sequences encoding same, as
well as plasmids and host cells comprising the nucleic
acid sequences.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 is a diagram that generally depicts the
approach of the present invention for the accelerated
evolution of enzymes. A pool of mutants of the
particular gene is obtained by means such as spontaneous,
directed, chemical, or PCR-mediated mutagenesis. The
mutants of interest (i.e., having the particular
stabilized feature) are identified by means of a screen
or selection (A), and optionally, compatible mutations
can be combined (e.g., by gene splicing, in vitro
recombination, and the like) to enhance the stability
even further (B).

Figure 2 is a digitized image of results of a
filter assay for alcohol dehydrogenase activity which
demonstrates that wild-type HLADH is rapidly inactivated
at 75°C: no heat treatment (A); 5 minutes of heat
treatment at 75°C (B); 10 minutes of heat treatment at
75°C (C); 15 minutes of heat treatment at 75°C (D); 20
minutes of heat treatment at 75°C (E); and 50 minutes of
heat treatment at 75°C (F).

Figure 3 is a partial restriction map of the
plasmid pTG450 which contains the adh gene from plasmid
pBPP cloned into a pTG100kantr2 Thermus shuttle vector.

Figure 4 is a bar chart that depicts the increased
thermostability of HLADH mutants produced according to
the invention at 70°C. Cells containing pGEM-T (i.e.,
having no HLADH gene) did not show any HLADH activity.

Figure 5 is the sequence of adh gene [SEQ ID NO:1]
that encodes the HLADH protein [SEQ ID NO:2], with the
location of certain mutations produced according to the
invention identified as the boxed regions.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides, among other things, 
a method for stabilizing a certain feature of a protein 
(e.g., stability at a certain temperature, stability in 
the presence of certain reagents, etc.). In particular, 
the method of the invention provides a method for 
thermostabilizing a protein. Namely, the invention
preferably provides a method of obtaining a nonnative
protein having a thermostability that is increased over
that of the native version of said protein, as further 
described herein. 

According to the invention, a "native" protein is 
the protein as it generally is found in nature. By 
contrast, a "nonnative" protein differs from the native 
protein in that it has been modified by human 
intervention, i.e., at either the level of the protein
or its encoding DNA (e.g., by recombinant means to
directly alter the genome; by unique selection and
forced mutation; by random mutagenesis). Moreover, a
"protein" desirably can be either an entire protein, or
a portion of a protein (e.g., as where a chimeric
nonnative protein results from either transcriptional or
translational gene fusion). Similarly, a "nonnative
protein" in some applications (e.g., applications for
further study) may be a peptide (i.e., an incomplete
protein), as where the peptide is chemically synthesized,
or where a gene's coding sequence is transcribed or
translated in vitro, or is produced by chemical
processing of a complete protein.

A preferred protein for stabilization, particularly 
thermostabilization according to the invention is a 
dehydrogenase, particularly an alcohol dehydrogenase, 
and especially horse liver alcohol dehydrogenase (e.g., 
as obtained from plasmid pBPP, and/or as set forth in
SEQ ID NO:2). Notably, with respect to SEQ ID NO:2, this
protein does not initiate with methionine (Met).
However, other variants of horse liver alcohol
dehydrogenase produced by in vitro synthetic reactions,
by means of chemical synthesis, or in other hosts (e.g.,
a eukaryotic host or other prokaryotic host cell) may
possess a methionine residue in the first position of
the protein. The numbering of residues in such proteins,
of course, would differ somewhat from that of SEQ ID
NO:2. Namely, the second position of the aforementioned
protein would be equivalent to the first position of the
protein of SEQ ID NO:2. Of course, the ordinarily
skilled artisan would know how to compare equivalent
regions of proteins.
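
The numbering relationship described above can be sketched in a few lines of Python. This is only an illustration of the one-residue offset; the function names and the `has_initiator_met` flag are hypothetical and do not appear in the specification.

```python
def to_seq_id_2_numbering(position: int, has_initiator_met: bool) -> int:
    """Map a 1-based residue position in a variant onto SEQ ID NO:2 numbering.

    SEQ ID NO:2 does not begin with Met, so a variant carrying an
    initiator Met at position 1 is shifted by exactly one residue.
    """
    if position < 1:
        raise ValueError("residue positions are 1-based")
    if not has_initiator_met:
        return position
    if position == 1:
        # The initiator Met itself has no counterpart in SEQ ID NO:2.
        raise ValueError("initiator Met has no equivalent in SEQ ID NO:2")
    return position - 1


def from_seq_id_2_numbering(position: int, has_initiator_met: bool) -> int:
    """Map a SEQ ID NO:2 position onto the numbering of a variant."""
    if position < 1:
        raise ValueError("residue positions are 1-based")
    return position + 1 if has_initiator_met else position
```

For example, under this sketch the second residue of a Met-initiated variant corresponds to the first residue of SEQ ID NO:2.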

Desirably, other proteins (particularly proteins
having capacity for industrial implementation) can be
stabilized (e.g., thermostabilized) according to the
invention. For instance, an alcohol dehydrogenase
protein can be employed from another species. It is
anticipated that this approach can be employed with
alcohol dehydrogenases from other species based on the
similarities between certain of the various alcohol
dehydrogenases. Also, a protein according to the
invention optionally can be another type of
dehydrogenase, e.g., another type of NAD(P)+-linked
dehydrogenase including, but not limited to, malate
dehydrogenase, lactate dehydrogenase, isocitrate
dehydrogenase (NADP+), hydroxyacyl CoA dehydrogenase,
glyceraldehyde 3-phosphate dehydrogenase, and glucose
6-phosphate dehydrogenase (NADP+).

In a preferred embodiment, the method can be
employed to thermostabilize a horse liver alcohol
dehydrogenase. This method generally is depicted in
Figure 1. Preferably the method comprises:

(a) obtaining in a vector a gene that encodes the
native protein;

(b) mutating the vector at more than one position
in the gene to produce a vector library
comprising mutated versions of the gene;

(c) introducing the vector library en masse into
cells of a strain in which the majority of the mutated
versions of the gene are transcribed and translated to
produce a cell library;

(d) screening the cell library to identify a cell
comprising a mutated version of the gene that encodes a
nonnative protein having a thermostability that is
increased over that of the wild-type version of the
protein; and

(e) purifying the cell from the cell library.
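
The logic of steps (a) through (e) can be illustrated with a toy in silico model. The sketch below is not the laboratory procedure: the gene is a short placeholder string, `toy_score` is a purely hypothetical stand-in for a measured thermostability, and all function names are assumptions introduced for illustration.

```python
import random

BASES = "ACGT"


def mutate(gene: str, n_mutations: int, rng: random.Random) -> str:
    """Step (b): substitute bases at n_mutations distinct positions."""
    if n_mutations > len(gene):
        raise ValueError("more mutations requested than positions available")
    seq = list(gene)
    for pos in rng.sample(range(len(seq)), n_mutations):
        # Always pick a base different from the current one.
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def make_library(gene: str, size: int, n_mutations: int, rng: random.Random):
    """Steps (b)-(c): build a library of independently mutated variants."""
    return [mutate(gene, n_mutations, rng) for _ in range(size)]


def screen(library, parent: str, score):
    """Step (d): keep variants that score strictly higher than the parent."""
    baseline = score(parent)
    return [variant for variant in library if score(variant) > baseline]


def toy_score(seq: str) -> int:
    """Hypothetical placeholder for an experimentally measured stability."""
    return seq.count("G") + seq.count("C")


def evolve(gene: str, rounds: int, size: int, n_mutations: int, seed: int = 0) -> str:
    """Repeat mutate/screen cycles, carrying the best hit into the next round."""
    rng = random.Random(seed)
    parent = gene
    for _ in range(rounds):
        hits = screen(make_library(parent, size, n_mutations, rng), parent, toy_score)
        if hits:
            parent = max(hits, key=toy_score)  # step (e): isolate the best variant
    return parent
```

Because a variant replaces the parent only when it scores strictly higher, the score in this toy model can never decrease from one round to the next.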

According to the invention, "gene that encodes said
protein" can comprise a recombinant or nonrecombinant
sequence, i.e., a sequence that is present as found in
nature (i.e., encodes a native amino acid sequence), or
has been modified, for instance by the introduction of
mutations (e.g., point mutations, insertions, deletions,
or rearrangements) to comprise a nonnative amino acid
sequence, or can be a mixture of native and nonnative
amino acid sequences. Similarly, a recombinant gene may
conjoin coding sequences (either in entirety or in part)
with regulatory sequences (e.g., transcription
initiation, transcription termination, translational
start or stop sites, protein secretion sequences, and
the like) which are not typically conjoined in nature.
This can allow the production of a protein in a host in
which it normally is not produced (e.g., production of a
eukaryotic protein in a prokaryotic cell). Preferably,
however, the recombinant gene (which can derive, in
entirety or part, from any prokaryotic, eukaryotic,
bacteriophage, or viral source) is capable of being
transcribed and translated in a prokaryotic cell,
particularly, a cell comprising a member of the genera
Escherichia or Thermus.

Thus, preferably a host cell in the context of the
present invention (i.e., which can be employed in a
method of stabilizing proteins) is a member of the
kingdom Bacteria, Archaea, or Eukarya. In particular,
preferably a cell employed in the method of stabilizing
(particularly thermostabilizing) proteins according to
the invention is a thermophile or hyperthermophile. In
particular, preferably a cell is a member of the genus
Thermus, and desirably is of the species Thermus flavus,
Thermus aquaticus, Thermus thermophilus, or Thermus sp.
Optimally a cell is either an Escherichia coli cell or a
Thermus aquaticus cell.

The vector in which the gene of interest is
subcloned can be any vector appropriate for delivery of
a gene to a cell. For instance, the vector can be a
plasmid, bacteriophage, virus, phagemid, cointegrate of
one or more vector species, etc. Optimally, however, a
vector is one that can be employed for gene expression
in a prokaryotic cell such as a Thermus or Escherichia
cell. It also is preferable that a vector have an
ability to shuttle between different cells, e.g.,
between a Thermus and an Escherichia cell. One such
vector that can be employed in the context of the
invention is the vector pTG450.

The preferred method of the invention calls for
mutating a vector containing the gene encoding the
protein to be stabilized. Any method of mutagenesis such
as is known to those skilled in the art, and particularly
as is described in the following Examples, can be
employed in the method of the invention for generating a
mutated gene. Desirably a PCR-based (error prone)
approach, especially as set out as follows, is employed
for mutagenesis. However, other mutagens (e.g., chemical
mutagens such as hydroxylamine) also can be employed.

In the preferred method of mutagenesis employed in
the invention, desirably the vector is mutated at more
than one position in the gene of interest. This can be
assessed by means known in the art and as described in
the Examples. Such mutagenesis in more than one position
in the gene will result in a "vector library" comprising
mutated versions of a gene, particularly of a horse
liver alcohol dehydrogenase gene, which are present in
the library mixture.

The vector library can be introduced en masse into
cells (e.g., by transformation). Since the vectors and
the cells employed for these methods are selected to be
compatible, and the gene is engineered (e.g., as
described below) to contain or to be flanked by any
sequences necessary for its expression, it is expected
that such introduction will result in the transcription
and ensuing translation of the introduced gene.
Moreover, such en masse introduction will result in the
generation of a cell library comprising a mixture of
cells transformed with plasmids having differing mutated
genes. In some instances, it may be desirable to
reisolate the vectors from the cell library (e.g., by a
plasmid isolation or other vector isolation protocol),
excise the mutated gene, and subclone the mutated
gene into another vector (e.g., a vector that has not
been mutagenized).

Following the generation of the cell library, the
cells preferably are screened under conditions that
allow identification of a cell comprising a mutated
version of the gene of interest that encodes a nonnative
protein having a feature that is stabilized (e.g.,
thermostabilized) over that of the wild-type (i.e.,
native) version of the protein. A variety of selection
means can be employed in accordance with the method of
the present invention and, in particular, the selection
means identified in the Examples which follow can be
employed. Of course, one of ordinary skill in the art
could modify these methods such that they are adapted
for a particular host cell and/or a particular protein
of interest. Desirably, however, screening conditions
are employed that provide for enrichment and/or
selection for a cell containing nonnative DNA that
encodes a protein having a particular feature of
interest.

In particular, when the protein being stabilized
according to the invention is an alcohol dehydrogenase,
and particularly HLADH, the screen preferably can be
carried out at increased temperature. For instance,
desirably, screening is done at temperatures a few
degrees above and a few degrees below the temperature at
which the native (i.e., wild-type) alcohol dehydrogenase
is inactivated in the particular host cell employed for
screening.

According to this invention, "increasing the
thermostability" of a nonnative protein means: (a)
increasing the length of time at which a nonnative
protein exhibits activity as compared to the wild-type
protein; (b) increasing the temperature at which a
nonnative protein exhibits activity as compared to a
wild-type protein; or (c) increasing the length of time
and temperature at which a nonnative protein exhibits
activity as compared to a wild-type protein. A protein's
activity can be determined by a variety of tests that
differ with the various proteins to be tested. A few
representative tests that can be employed in the method
of the invention are set out in the following Examples.
Preferably, however, "activity" means a detectable
activity ranging from 10 to 90 units. For instance,
whereas a wild-type protein might exhibit 10% activity
at a defined temperature for a set amount of time, a
thermostabilized enzyme might exhibit 10% activity at
the same temperature for an increased amount of time,
and/or might exhibit an activity at an increased
temperature at which the native protein exhibits reduced
or no activity.

The screening methods also desirably can be done,
for instance, in the presence of alcohol, optionally at
a lowered pH.
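
The three-part definition of increased thermostability given above, criteria (a) through (c), can be expressed as a small comparison routine. The sketch below is illustrative only: the data layout, the function names, and the use of a 10% residual-activity cutoff as the default threshold (taken from the example in the text) are assumptions, and any measurements passed to it would have to come from an actual assay.

```python
from typing import Dict, Tuple

# (temperature in degrees C, minutes of heat treatment) -> residual activity fraction
Measurements = Dict[Tuple[float, float], float]


def longest_active_time(m: Measurements, temp_c: float, threshold: float) -> float:
    """Longest treatment time at temp_c after which activity is still detected."""
    times = [t for (temp, t), act in m.items() if temp == temp_c and act >= threshold]
    return max(times, default=0.0)


def highest_active_temp(m: Measurements, threshold: float) -> float:
    """Highest temperature at which activity is still detected at any time point."""
    temps = [temp for (temp, _), act in m.items() if act >= threshold]
    return max(temps, default=float("-inf"))


def is_thermostabilized(mutant: Measurements, wild_type: Measurements,
                        temp_c: float, threshold: float = 0.10) -> bool:
    """True if the mutant satisfies criterion (a), (b), or both (c)."""
    longer = (longest_active_time(mutant, temp_c, threshold)
              > longest_active_time(wild_type, temp_c, threshold))
    hotter = (highest_active_temp(mutant, threshold)
              > highest_active_temp(wild_type, threshold))
    return longer or hotter
```

Criterion (c) is the case in which both comparisons hold, so it is covered by the logical "or" of (a) and (b).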

Following screening of cells to identify those 
having the desired trait(s) imparted by the mutated
gene, optionally, cells exhibiting the trait can be 
further isolated. Vectors containing mutated versions of 
the gene of interest optionally can be further 
mutagenized by repeating steps (b) through (e) above to 
further stabilize the encoded protein. 

The present invention accordingly also provides
screens that can be employed to select for or against
cells having altered ADH activity. For instance, the
invention provides a method for selecting against growth
of Escherichia coli recombinant cells which comprise
levels of alcohol dehydrogenase that are higher than
those of wild-type Escherichia coli cells. According to
this invention, "growth" means an increase in cell mass,
or some other evidence of cell metabolism such as one of
ordinary skill in the art knows how to detect, or is
described in the following Examples. An "absence of
growth" means growth is not measurable by common
procedures (e.g., visual or spectrophotometric
observation and the like), or cell killing. Cell killing
can be determined by any well known means, e.g., visual
observation, release of cell components, vital staining,
etc.

Thus the E. coli selection method comprises growing
said recombinant cells under conditions selected from
the group consisting of conditions wherein ethanol is
present in a concentration of about 10%, isopropanol is
present in a concentration of about 4%, and propanol is
present in a concentration of about 2%, with the proviso
that the wild-type cells exhibit reduced or an absence of
growth under these conditions.

The present invention similarly provides a method
for selecting for growth of Thermus flavus recombinant
cells which comprise levels of alcohol dehydrogenase
that are higher than those of wild-type Thermus flavus
cells. This method comprises growing the recombinant
cells under conditions selected from the group
consisting of conditions wherein ethanol is present at a
concentration of about 1% in a liquid or solid medium at
a pH of about 7.0, with the proviso that the wild-type
cells exhibit reduced or an absence of growth under
these conditions.

As mentioned previously, these methods have been
employed to thermostabilize HLADH. In particular, the
invention provides an isolated and purified
thermostabilized HLADH protein comprising a sequence
selected from the group consisting of SEQ ID NO:4, SEQ
ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID
NO:14, SEQ ID NO:16, SEQ ID NO:18 and SEQ ID NO:20. The
invention also provides genes encoding such protein,
e.g., an isolated and purified nucleic acid comprising a
sequence selected from the group consisting of SEQ ID
NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID
NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17 and SEQ
ID NO:19.

Moreover, the invention provides for plasmids
encoding for such proteins: e.g., a plasmid comprising
one of the aforementioned nucleic acid sequences; and a
plasmid selected from the group consisting of pAD7,
pAD8, pAD10, pAD91, pAD92, pAD93, pAD95, pAD111, pAD113,
and pTG450.

The invention further preferably provides a method
of increasing the thermostability of horse liver alcohol
dehydrogenase. This method comprises introducing into a
gene which encodes the alcohol dehydrogenase a mutation
at a codon which codes for an amino acid residue at a
position selected from the group consisting of the amino
acid positions 75, 94, 110, 177, 257, 268, 282, 292,
and 297.
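
As a rough illustration of how a candidate mutant might be checked against the positions listed above, the following Python sketch compares two aligned protein sequences. The function names are hypothetical, positions are assumed to follow SEQ ID NO:2 numbering (1-based, no initiator Met), and the sequences used with it would be supplied by the reader; none are reproduced here.

```python
# Positions recited in the text, assumed to be numbered as in SEQ ID NO:2.
LISTED_POSITIONS = frozenset({75, 94, 110, 177, 257, 268, 282, 292, 297})


def substitutions(wild_type: str, mutant: str):
    """Return (position, wild-type residue, mutant residue) for each difference."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be the same length (substitutions only)")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(wild_type, mutant), start=1)
        if a != b
    ]


def at_listed_positions(wild_type: str, mutant: str):
    """Subset of substitutions that fall on one of the listed positions."""
    return [s for s in substitutions(wild_type, mutant) if s[0] in LISTED_POSITIONS]
```

The sketch handles substitutions only; insertions or deletions would require a proper sequence alignment first.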
Examination of the three-dimensional structure of
the HLADH protein will elucidate the manner in which
further amino acid substitutions thermostabilizing the
enzyme can be made, for instance, like-for-like (e.g.,
with acidic amino acids (i.e., aspartic acid, glutamic
acid) being substituted for acidic amino acids; basic
amino acids (i.e., lysine, arginine, histidine) being
substituted for basic amino acids; sulfur-containing
amino acids (i.e., cysteine) being substituted for
sulfur-containing amino acids; amides (i.e., asparagine,
glutamine) being substituted for amides; aliphatic
nonpolar amino acids (i.e., glycine, alanine, valine,
leucine, isoleucine) being substituted for aliphatic
nonpolar amino acids; and alcoholic, aliphatic, and
aromatic amino acids (i.e., serine, threonine,
tyrosine, phenylalanine, and tryptophan) being
substituted for alcoholic, aliphatic, and aromatic amino
acids).
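
The like-for-like groupings listed above can be captured in a simple lookup table. The sketch below uses the standard one-letter amino acid codes; the group labels and function names are hypothetical, and residues that the text does not assign to any group (for example methionine and proline) are deliberately left unclassified.

```python
# Groups exactly as enumerated in the text, in one-letter code.
SUBSTITUTION_GROUPS = {
    "acidic": set("DE"),                  # aspartic acid, glutamic acid
    "basic": set("KRH"),                  # lysine, arginine, histidine
    "sulfur-containing": set("C"),        # cysteine
    "amide": set("NQ"),                   # asparagine, glutamine
    "aliphatic nonpolar": set("GAVLI"),   # glycine, alanine, valine, leucine, isoleucine
    "alcoholic/aromatic": set("STYFW"),   # serine, threonine, tyrosine, phenylalanine, tryptophan
}


def group_of(residue: str):
    """Return the group label for a residue, or None if the text lists none."""
    for label, members in SUBSTITUTION_GROUPS.items():
        if residue in members:
            return label
    return None


def is_like_for_like(original: str, replacement: str) -> bool:
    """True if both residues fall in the same group listed in the text."""
    group = group_of(original)
    return group is not None and group == group_of(replacement)
```

A check of this kind only classifies a proposed substitution; whether it actually stabilizes the enzyme would still have to be determined experimentally.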






Additional uses and benefits of the invention will 
be apparent to one of ordinary skill in the art. 



EXAMPLES 



The following examples further illustrate the
present invention but, of course, should not be construed
as in any way limiting its scope.



EXAMPLE 1: Quantitative assay for ADH in cell extracts

This example describes a method for the 
quantification of ADH in cell extracts, particularly for 
the quantitation of HLADH, that can be used according to 
the invention. 

For this assay, overnight cultures of cells to be
assayed are grown in rich media. The cells are washed,
resuspended in 600 µl of assay buffer (83 mM KH2PO4 [pH
7.3], 40 mM KCl, 0.25 mM EDTA), and sonicated. The assay
mixture contains 500 µl of cell extract, 100 µl EtOH, 20
µl 100 mM NAD, and 830 µl buffer, and is carried out at room
temperature. The reaction is run for 3 minutes and
absorbance at 340 nm is measured. Using this approach it
is possible to identify a high IPTG-inducible activity in
the strains with the HLADH coding sequence under the
control of the lacZ promoter. This method thus produces
a reliable quantitative determination of HLADH activity
present in the cell.
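
The arithmetic behind the assay described above can be worked through in a short Python sketch. The volumes are those given in the text; the molar extinction coefficient of NADH at 340 nm (6220 per M per cm) is a standard literature value that the text does not state, and the 1 cm path length and all function names are likewise assumptions made for illustration.

```python
# Assay components as listed in the text, in microliters.
VOLUMES_UL = {"cell extract": 500.0, "EtOH": 100.0, "100 mM NAD": 20.0, "buffer": 830.0}

# Assumed literature value for NADH at 340 nm, in 1/(M*cm); not given in the text.
NADH_EXTINCTION_M_CM = 6220.0


def total_volume_ul(volumes: dict) -> float:
    """Total assay volume in microliters."""
    return sum(volumes.values())


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Concentration after dilution, in the same units as the stock."""
    return stock * added_ul / total_ul


def nadh_formed_umol(delta_a340: float, total_ul: float, path_cm: float = 1.0) -> float:
    """Micromoles of NADH formed, from the absorbance change via Beer-Lambert."""
    conc_molar = delta_a340 / (NADH_EXTINCTION_M_CM * path_cm)
    litres = total_ul * 1e-6
    return conc_molar * litres * 1e6  # mol -> micromol


def activity_umol_per_min(delta_a340: float, minutes: float, total_ul: float,
                          path_cm: float = 1.0) -> float:
    """Average rate of NADH formation over the assay period."""
    return nadh_formed_umol(delta_a340, total_ul, path_cm) / minutes
```

With the listed volumes the assay totals 1450 µl, so the 20 µl of 100 mM NAD is diluted to roughly 1.38 mM and the ethanol to roughly 6.9% (v/v). Under the stated assumptions, an absorbance change of 0.622 over the 3 minute run would correspond to 0.145 µmol of NADH formed.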

EXAMPLE 2: p-Rosaniline/alcohol plate screen in E. coli

This example describes a plate screen for ADH
activity that can be employed, for instance, in E. coli.

p-Rosaniline indicator plates are prepared according
to Conway et al. (Conway et al., 169, 2591-2597 (1987))
by adding 8 ml of p-rosaniline (2.5 mg/ml in 96% ethanol)
and 100 mg of sodium bisulfite to 400 ml batches of
precooled (45°C) Luria agar. Most of the dye is
immediately converted to the leuco form by reaction with
bisulfite to produce a rose-colored medium. Ethanol
diffuses into the E. coli cells, where it is converted to
acetaldehyde by alcohol dehydrogenase. The leuco dye
serves as a sink, reacting with the acetaldehyde to form
a Schiff base which is intensely red. Thus, the plates
can be streaked with a strain, or a strain can be applied
in patches to the plate. Colonies will appear a deeper
intensity of red dependent upon the level of ADH present
in the cell. In particular, by plating appropriate
controls on each plate, it is relatively easy to visually
discern a strain which has a high level of dehydrogenase
(deep red staining), an intermediate level of
dehydrogenase (more moderate red staining), and no
activity (little or no red staining).

This method thus provides a plate screen that can be
employed in the method of the invention.

EXAMPLE 3: Filter screen for HLADH activity

This example describes a sensitive plate assay of
ADH activity which also allows colonies to be tested
under different treatment conditions.

This assay relies for manipulation of bacterial
colonies on the binding of the colonies to a
nitrocellulose filter. The assay is carried out by a
modified protocol described by Rellos et al. (Rellos et
al., Protein Expression and Purification, [illegible]-277
[illegible]). A series of temperatures between 65 and
85°C in [illegible] increments with incubation times varying
from 10 minutes to one hour is analyzed in an attempt to
determine the cutoff of the stability of the HLADH
protein. For these experiments, the source of the adh
gene encoding the HLADH enzyme was plasmid pBPP
([illegible], J. Biol. Chem., 266, 13296-13302 (1991)).

E. coli DH5α cells containing plasmid pBPP (i.e.,
HLADH+) or plasmid pCRII (i.e., HLADH-) (Invitrogen,
Carlsbad, CA) were grown on rich media plates at cell
densities of about 1,000 colonies per plate and
transferred onto a nitrocellulose membrane. The adhered
cells were treated with Buffer 1 ([illegible] DNAse)
in a chloroform bath for about one hour, washed
once in Buffer 2 (10 mM KMes, 0.5 mM CoCl2, [illegible] BSA),
and then washed two more times in Buffer 3 (Buffer 2
without BSA). The filters were then incubated at high
temperatures in Buffer 4 (10 mM glycine, 0.5 mM CoCl2)
and, after washing in Buffer 3, were incubated in the
enzyme-detecting solution (30 mM Tris, pH 8.3, 2%
ethanol, [illegible] mM NAD+, [illegible] mg/ml phenazine
methosulfate, [illegible] mg/ml nitroblue tetrazolium) at
room temperature for 3-5 minutes.

Results of these experiments are depicted in Figure
2. As can be seen in this figure, the experiments
confirm that a 15-20 minute treatment of the filters at
75°C resulted in roughly 90% inactivation of the HLADH
protein as estimated by the color changes. This
information on the activity of the native protein can be
used as a baseline for the identification and isolation
of mutagenized candidates having altered ADH activity
according to the invention.

EXAMPLE 4: Shuttle vectors and use of a p-rosaniline assay for
verification of the activity of the HLADH gene in Thermus

In order to allow expression of the HLADH gene in
both Thermus and E. coli, the gene was subcloned into the
Thermus shuttle vector pTG100kantr2 to create plasmid
pTG450, depicted in Figure 3. In this construct, the gene
is placed upstream of the thermostable kanamycin
resistance gene (kantr2), which is controlled by the lac
promoter in E. coli, and the leu promoter in Thermus.

An E. coli strain harboring pTG450 has three times
more HLADH activity in the presence of IPTG than the
strain harboring the original pBPP plasmid. When
transformed into Thermus, the adh gene integrates into
the leuB site in the Thermus chromosome by a double
recombination event. For these experiments, Thermus
flavus was transformed with both the HLADH- plasmid
pTG100kantr2 (i.e., creating strain TGF353) and the HLADH+
plasmid pTG450 (i.e., creating strain TGF650).

The presence of the adh gene in TGF650 was confirmed
by PCR, and both TGF353 and TGF650 cells were assayed
using a variation of the p-rosaniline plate assay
described in Example 2. Namely, the agar overlay
contained the same ingredients described, except TT media
(Weber et al., Bio/Technology, 13, 271-275 (1995); Oshima
et al., International Journal of Systematic Bacteriology,
24, 102-112 (1974)) was employed instead of Luria broth.
A standard p-rosaniline plate cannot be used, since the
indicator dye will spontaneously convert to the Schiff
base if incubated overnight in the plate as part of this
assay.

Using this approach, HLADH activity was observed in
the pTG450 Thermus transformants at a level well above
background levels observed for the pTG100kantr2 Thermus
transformants. The activity was observed up to 70°C.
These results thus confirm that a p-rosaniline plate
assay similarly can be employed in the context of the
present invention for screening in Thermus for mutants
having altered ADH activity.

EXAMPLE 5: Development of a Method of HLADH
Selection/Enrichment in E. coli

This example describes a method of negative
selection for growth of E. coli strains harboring the
adh gene.

For these experiments, E. coli DH5α cells
containing either pTG100kantr2 (i.e., HLADH-) or pTG450
(i.e., HLADH+) were grown on LB plates with different
alcohols in concentrations ranging from 2% to 12%. The
results of one such experiment are displayed in Table 1.

As can be seen from Table 1, E. coli cells
harboring high activity of HLADH (i.e., transformed with
the HLADH+ plasmid pTG450) are more sensitive to the
presence of the alcohols in high concentrations. This
probably is due to the accumulation of toxic aldehyde
levels in the cells which result from the alcohol
dehydrogenase reaction. Three other alcohols were
tested (i.e., benzyl alcohol, hexyl alcohol, and hexyl
amine), but did not give clear results because of their
poor solubility in the media.

The experiment was repeated several times and the
alcohol levels were refined to determine a range
resulting in a clear selection. Three of the alcohols,
i.e., ethanol at a concentration of 10%, isopropanol at
a concentration of 4%, and propanol at a concentration
of 2%, resulted in clean, negative selection for growth
of E. coli harboring the adh gene.

These results thus confirm that the selection
scheme can be employed for the isolation of mutants with
altered ADH activity and, in particular, to select
against E. coli strains having high levels of ADH. Such
a system of negative selection also can be employed to
affirmatively identify mutants having high levels of
ADH. For instance, cells can be replica plated onto a
series of plates from a single master plate prior to
their transfer to nitrocellulose membranes. One of the
plates can be retained, instead of being transferred to
nitrocellulose, and matched against the sensitive cells
identified in the assay. Cells of interest can then be
recovered from the untreated plates.

EXAMPLE 6: Development of a Method of HLADH
Selection/Enrichment in Thermus

This example describes the growth of Thermus
strains in the presence of high concentrations of
alcohols as a general method for selecting for growth of
Thermus strains having high levels of ADH activity.

A series of experiments was conducted to develop a
selection using alcohol levels in Thermus. In these
experiments, Thermus flavus strains TGF353 (HLADH-) and
TGF670 (HLADH+) were employed. Each strain was grown
for two days on Thermus rich media (e.g., TT media, as
described in Oshima et al., International Journal of
Systematic Bacteriology, 24, 102-112 (1974)) present in
plates, or was grown overnight in 4 ml of liquid TT
medium, in order to ensure the cells were at the same
physiological stage prior to testing. The test itself
was performed on TT media and Thermus minimal media (Yeh
et al., J. Biol. Chem., 251, 3134-3139 (1976)) containing
Casamino acids (TMIN, CAA). Over a series of many
experiments, the strains were grown on agar plates or in
liquid medium containing various concentrations of
ethanol (i.e., 0.5, 1, 2, 4, 6, or 8%), various
concentrations of methanol (i.e., 2, 4, 6, or 8%),
various concentrations of isopropanol (i.e., 0.5, 1, 2,
4, or 6%), various concentrations of propanol (i.e., 1,
2, 4, or 6%), or various concentrations of propanediol
(i.e., 0.5 or 1%). Such experiments further were done at
different pHs, i.e., at pH 7.0, 7.5 and 8.0, for the
various alcohols at different concentrations. The
results of one of these experiments are set out in Table
2.

As can be seen from this experiment, the HLADH+
strain TGF670 demonstrates higher resistance to alcohols
than the HLADH- strain TGF353. Moreover, this selection
appears to be dependent on pH, with the selection
functioning better at lower pH, especially with ethanol.
The selection thus may work by lowering the pH of the
media (Thermus prefers higher pH for growth, in the range
of pH 7.5-8.5), although not enough Thermus
biochemistry is known to make this conclusive.

A similar effect can also be achieved on plates.
However, the primary effect of the screen in Thermus is
to retard growth of cells without the adh gene, not to
completely eliminate it. This also is the case with the
liquid media, indicating that a completely clean
selection in Thermus without background is difficult to
achieve. Nevertheless, this selection means provides a
powerful enrichment, especially in liquid, by selecting
for faster growing cells under the conditions defined.

The results thus confirm that the
enrichment/selection means outlined above can be
employed with Thermus.

EXAMPLE 7: Hydroxylamine mutagenesis of the adh gene

This example describes mutagenesis of the adh gene
as a representative alcohol dehydrogenase gene using the
mutagen hydroxylamine (HA).

For HA mutagenesis of the adh gene, plasmids pBPP
and pTG450, both of which contain this gene, were treated
with HA using a standard approach. Namely, approximately
8 µg of plasmid DNA was mixed with 0.5 M NH2OH and
incubated at 37°C for various lengths of time. For
example, aliquots were taken at 1, 2, 3, or 4 hours
following treatment, or following overnight exposure to
the mutagen. The plasmid DNA was then transformed into
E. coli strain DH5α and plated onto LB Ap100 plates (i.e., LB
plates containing 100 µg/ml ampicillin). Transformants
were analyzed by the ADH filter assay described in
Example 3, and also using the p-rosaniline assay
described in Example 2 to estimate the efficiency of
mutagenesis.

After overnight treatment, only 3-4% of the plasmids
treated with HA remained active. Plasmids treated with HA
under conditions providing ~50% inactivation of the
adh gene were then transformed into E. coli strain NM554
(obtained from New England Biolabs) to obtain 500-700
transformant colonies per plate. These colonies were
analyzed by the nitrocellulose filter ADH assay described
in Example 3. For heat inactivation of ADH, the filters
were incubated for 15 minutes at 70°C in a hybridization
oven.

Approximately 20,000 transformants were screened
using this rapid method. Eighteen candidates were
identified which appeared to show increased ADH
thermotolerance. The candidates were purified and
assayed on the same filter as control strains (i.e.,
strain XL1 containing the LADH+ plasmid pBPP, and strain
NM554 containing the LADH- plasmid pBluescript).

Based on results of the filter screening, none of
the identified candidates appeared to have the
temperature-resistant phenotype suggested by the results
of the ADH filter assay. It is possible, however, that
thermoresistant mutants can be obtained with HA upon
further screening. Moreover, the chances of obtaining
mutagenized adh resulting in enzyme thermostabilization
might be further increased by excising the mutagenized
gene from the vector, and resubcloning into a wild-type
vector (i.e., a vector that has not been treated with
HA), followed by screening.

EXAMPLE 8: PCR Mutagenesis of the adh gene

This example describes PCR mutagenesis of the adh
gene as a representative alcohol dehydrogenase gene.

To increase the efficiency of the cloning of
mutagenized adh, primers for directional cloning were
employed:

CCC CGA ATT CTC AAA ACG TCA GGA TGG TAC G
     ADH(EcoRI) [SEQ ID NO: 21]

CCC CTC TAG AAT AAA TGA GCA CAG CAG GAA AAG TAA TAA AAT GC
     ADH(XbaI) [SEQ ID NO: 22]

The adh gene was amplified using these primers and cloned
into a pGEM-T vector.
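
As a quick sanity check on the two cloning primers, the following Python sketch confirms that each primer carries the recognition site its label names (GAATTC for EcoRI, TCTAGA for XbaI) and that the bases following the site match the ends of the adh coding sequence. The function and variable names are illustrative only and are not part of the described method; the two gene-end strings are copied from the first and last lines of SEQ ID NO: 1 in the Sequence Listing.

```python
# Illustrative check of the directional-cloning primers (hypothetical helper names).
ECORI_SITE = "GAATTC"   # standard EcoRI recognition sequence
XBAI_SITE = "TCTAGA"    # standard XbaI recognition sequence


def clean(seq: str) -> str:
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return "".join(seq.split()).upper()


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def annealing_region(primer: str, site: str) -> str:
    """Return the part of the primer that follows the restriction site."""
    start = primer.find(site)
    if start < 0:
        raise ValueError("restriction site not found in primer")
    return primer[start + len(site):]


PRIMER_ECORI = clean("CCC CGA ATT CTC AAA ACG TCA GGA TGG TAC G")
PRIMER_XBAI = clean("CCC CTC TAG AAT AAA TGA GCA CAG CAG GAA AAG TAA TAA AAT GC")

# First and last lines of the coding sequence as printed in SEQ ID NO: 1.
GENE_5_START = clean("ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC")
GENE_3_END = clean("ATC CGT ACC ATC CTG ACG TTT TGA")

if __name__ == "__main__":
    # The XbaI primer reads into the start of the gene (sense strand).
    print(annealing_region(PRIMER_XBAI, XBAI_SITE).endswith(GENE_5_START))
    # The EcoRI primer is the reverse complement of the end of the gene.
    print(revcomp(GENE_3_END).startswith(annealing_region(PRIMER_ECORI, ECORI_SITE)))
```

Because the XbaI site sits at the 5' end of the gene and the EcoRI site at the 3' end, a fragment cut with both enzymes can ligate into a similarly cut vector in only one orientation, which is the point of directional cloning.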

For PCR mutagenesis two protocols were used, one
according to Spee et al. (Spee et al., Nucl. Acids Res.,
21, 777-778 (1993)), and another according to Rellos et
al. (Rellos et al., supra) in which the limiting dNTP
concentration was double that of the first procedure and
dITP was not employed. The pGEM-T plasmid containing the
adh gene was then used as a template for PCR mutagenesis
of adh using standard T7 and SP6 primers to perform the
error-prone PCR reaction under these conditions.

Mutagenized adh-containing fragments were digested
using XbaI and EcoRI enzymes, and subcloned into
pBluescript SK to create a pBlue-ADH library. The
resultant pBlue-ADH library (i.e., one library for each
mutagenesis method performed) was transformed en masse
into E. coli strain NM554 to allow the adh gene to be
transcribed from the lac promoter. Transformants were
then analyzed: (i) by PCR to determine the efficiency of
cloning (% of the plasmids with and without insert), and
(ii) by ADH filter assay to determine the efficiency of
mutagenesis (% inactive ADH- clones). The results of
these analyses are shown in Table 3.

Table 3. Mutant candidates identified

Method of              Percentage of the      Percentage of the
mutagenesis*           plasmids with the      ADH+ clones
                       insert

Method No. 1           60%                    64%
Method No. 2           90%                    36%
No mutagenesis         80%                    75%
(wild-type adh)

* Method No. 1 was done according to Spee et al., supra
(i.e., with 14 µM of limiting dNTP and 200 µM dITP) and
Method No. 2 was done according to Rellos et al., supra
(i.e., without dITP and with 25 µM of the limiting dNTP).

As can be seen from these results, both the cloning and
mutagenesis efficiency were better using the second
method.

The transformants were then plated to a density of
500-700 cells per plate and assayed on the filters
under the same conditions described in the prior example
for HA mutagenesis of the adh gene. Approximately 5,000
clones containing adh mutagenized by the first method,
and the same number of clones mutagenized by the second
method, were tested. No thermostable candidates from the
first method were identified. By contrast, thirteen
candidates were selected from clones mutagenized by the
second method which appeared to possess an HLADH variant
that was more stable than the wild-type enzyme. Upon
restreaking and retesting these colonies by the filter
assay method, nine of the thirteen candidates (i.e.,
plasmids pAD7, pAD8, pAD10, pAD91, pAD92, pAD93, pAD95,
pAD111, and pAD113) were chosen for further
characterization.

These results confirm that PCR-mediated mutagenesis,
particularly as described herein, can be employed to
obtain potential thermostable LADH variants. The results
further indicate that the method can be employed to
obtain other stabilized alcohol dehydrogenases, or other
stabilized proteins.

EXAMPLE 9: Characterization of thermotolerant
HLADH candidates

This example describes a characterization for
increased thermostability of mutants identified in the
prior example.

These experiments were done by calculating the
residual HLADH activity at 70°C for a series of
incubation periods. Residual activity is calculated as
activity after incubation at a particular temperature
divided by activity before incubation. Cultures of the
mutant candidates as well as control cells harboring the
wild-type HLADH+ control plasmid pBPP and HLADH- negative
control plasmid pGEM-T were grown in appropriate media,
and cell extracts were made by sonication. The extracts
were then incubated at 70°C, taking an initial sample
(t0), and sampling at about 30, 60, and 120 minutes. The
samples were stored on ice, and the HLADH activity was
determined spectrophotometrically as described in Example
1. The data were plotted as a percentage of activity
compared to the t0 activity (residual activity) in order
to compare the individual samples to each other and
adjust for variations in expression levels or growth
variations.
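
The residual-activity normalization described above amounts to a single division per time point. The following minimal Python sketch illustrates it; the function name is hypothetical, and the two example rows are copied from the unit values tabulated in Table 4 (pADH91 and the wild-type clone) rather than from any additional measurement.

```python
# Minimal sketch of the residual-activity calculation (illustrative names).

def residual_activity(activity_after: float, activity_before: float) -> float:
    """Activity after heat treatment as a percentage of the activity before it."""
    if activity_before <= 0:
        raise ValueError("activity before incubation must be positive")
    return 100.0 * activity_after / activity_before


# Unit values at RT, 15, 30 and 60 minutes, taken from Table 4.
TABLE_4_UNITS = {
    "pADH91": [11, 8, 6, 4],
    "WT": [10, 1, 0.3, 0],
}

if __name__ == "__main__":
    for strain, units in TABLE_4_UNITS.items():
        baseline = units[0]  # untreated (RT) sample serves as the reference
        percents = [round(residual_activity(u, baseline)) for u in units]
        print(strain, percents)
```

Dividing by each clone's own untreated activity is what allows clones with very different expression levels to be compared on one scale.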

Figure 4 displays the residual activity data for the
nine candidate plasmids pAD7, pAD8, pAD10, pAD91, pAD92,
pAD93, pAD95, pAD111, and pAD113, wherein the t0 activity
is normalized to 1.00 (100%). As can be seen from Figure
4, all the mutants exhibited increased thermotolerance
compared to cells containing plasmid pBPP, which contains
the wild-type HLADH gene. In particular, plasmids pAD91,
pAD92, and pAD10 showed the most noticeable alterations
in thermostability. Cells containing pGEM-T (i.e., not
having an HLADH gene) did not show any HLADH activity.

These results thus confirm that the method of the
invention can be employed to obtain thermostable alcohol
dehydrogenase, particularly HLADH, mutants.

Table 4 below provides comparative
data for HLADH activities in the original wild-type
("WT") clone and mutants. All clones were grown in 50 ml
of LB medium with 100 µg/ml Amp (12.5 µg/ml Tet for WT
clone) overnight, concentrated in 1 ml of the assay
buffer (83 mM KH2PO4, 40 mM KCl, 0.25 mM EDTA), sonicated
and assayed with ethanol as a substrate and NAD cofactor,
with results shown as U = mol/mg protein x 1000 / percent
residual activity.

Table 4. HLADH Activity after Heat Treatment

                    Heat Treatment time
Strain       RT          15 min      30 min      60 min

pADH7        8/100%      4/50%       2/25%       0.6/8%
pADH8        21/100%     7.4/35%     2/10%       0.2/1%
pADH10       16/100%     4/25%       1.4/9%      0/0%
pADH91       11/100%     8/73%       6/55%       4/36%
pADH92       25/100%     15/60%      17/68%      12/48%
pADH93       6/100%      1/17%       2.5/42%     0/0%
pADH95       66/100%     21/32%      10/15%      3/5%
pADH111      22/100%     15/68%      16/73%      11/50%
pADH113      9/100%      4/44%       3/33%       0.8/9%
WT           10/100%     1/10%       0.3/3%      0/0%


Table 5 below provides comparative
data for HLADH activities and substrate specificity of
the original wild-type ("WT") clone and mutants. All
clones were grown in 1 L of LB medium with 100 µg/ml Amp
(12.5 µg/ml Tet for WT clone) overnight, concentrated in
50 ml of the assay buffer (83 mM KH2PO4, 40 mM KCl, 0.25
mM EDTA), sonicated, incubated at 55°C for 5 min to
denature the E. coli proteins and lyophilized. The assays
were performed at room temperature with the listed
substrate and NAD cofactor, with results shown as U =
mol/mg protein x 1000.

Table 5. HLADH Substrate Specificity

Strain       Ethanol     Isopropanol   Butanol       Benzyl Alcohol

pADH7        8.7         0             5.3           1.4
pADH8        18.2        1.4           11            7
pADH10       15.6        3             11.5          4.7
pADH91       13.2        1.1           4.7           3.4
pADH92       23.5        2.3           11            6.8
pADH93       5.6         1             3             1.6
pADH95       48          0.7           21.3          4.5
pADH111      22.6        1.6           9.8           3
pADH113      7           1.1           3             5
WT           9.2         1.7           7.6           3.5

Strain       Hexanol     Cyclohexanol  R-(-)Butanol  S-(+)Butanol

pADH7        4           3             0             0
pADH8        15          49            2.4           2.2
pADH10       15          69            10            4
pADH91       5.8         23            2.4           1.7
pADH92       10.6        50            2.3           2.4
pADH93       3.9         22            2             1.4
pADH95       21.3        16.5          0.5           0.8
pADH111      9.4         58            4             2.7
pADH113      2.7         14.7          2             1.3
WT           10          42            4.3           2.9


EXAMPLE 10: Sequence Analysis of HLADH
Thermotolerant Candidates

This example describes the sequencing of the
mutagenized adh genes.

The inserts of plasmids containing the mutagenized
adh gene were sequenced using an ABI DNA sequencer, and
compared to the sequence of the wild-type protein. The
translated nucleic acid/amino acid sequence for plasmids
having the wild-type or mutant adh genes is given in
Figure 5, with the positions of the non-silent mutations
(i.e., those that change the encoded amino acid)
indicated by the boxes. Table 6 summarizes all the
nucleic acid mutations and the respective amino acid
changes, if any, introduced by the mutations.

Table 6. Mutations identified in thermotolerant
candidates

Mutant      Base pair   Amino acid   Original   Mutant   Amino acid
plasmid     position    position     codon      codon    change

pAD7        774         257          ATG        ATA      Met257Ile
            878         292          GTG        GCG      Val292Ala
pAD8        285         94           ACT        ACC      no aa change
            806         268          GTC                 Val268Ala
pAD10       227         75           AGC        AAC      Ser75Asn
pAD91/92    284         94           ACT        ATT      Thr94Ile
pAD93       847         282          TGT        AGT      Cys282Ser
                        297          GAT        GGT      Asp297Gly
            774         257          ATG        ATA      Met257Ile
            878         292          GTG        GCG      Val292Ala
pAD111      532         177          TCT        ACT      Ser177Thr
pAD113      129         42           GCC        GCT      no aa change
            159         52           GTG        GTA      no aa change
            331         110          TTC        CTC      Phe110Leu


Also, the individual sequences of the mutant adh
genes are set forth in the Sequence Listing for pAD7
(i.e., nucleic acid sequence at SEQ ID NO:3 and amino
acid sequence at SEQ ID NO:4), pAD8 (i.e., nucleic acid
sequence at SEQ ID NO:5 and amino acid sequence at SEQ ID
NO:6), pAD10 (i.e., nucleic acid sequence at SEQ ID NO:7
and amino acid sequence at SEQ ID NO:8), pAD91/pAD92
(i.e., nucleic acid sequence at SEQ ID NO:9 and amino
acid sequence at SEQ ID NO:10), pAD93 (i.e., nucleic acid
sequence at SEQ ID NO:11 and amino acid sequence at SEQ
ID NO:12), pAD95 (i.e., nucleic acid sequence at SEQ ID
NO:13 and amino acid sequence at SEQ ID NO:14), pAD111
(i.e., nucleic acid sequence at SEQ ID NO:15 and amino
acid sequence at SEQ ID NO:16), and pAD113 (i.e., nucleic
acid sequence at SEQ ID NO:17 and amino acid sequence at
SEQ ID NO:18).

The first numbered amino acid in the wild-type and
mutant sequences is serine since, in the sequences
studied, the initial methionine (Met) is not present in
the final protein. However, it is possible that Met is
present in the wild-type (or mutant) HLADH sequences that
are produced in a different host, e.g., in a eukaryotic
host, or when transcribed and translated from a different
plasmid construct or chromosome.

As can be seen from this data, the sequences of
pAD91 and pAD92 are identical, which indicates the clones
from which the DNA was isolated likely are siblings.
Mutants containing plasmids pAD91, pAD92, pAD93, and
pAD95 were identified from the same filter and mutants
containing plasmids pAD111 and pAD113 were identified
from the same filter assay. Also, in both pAD8 and
pAD91/92, the coding sequence specifying amino acid 94 is
mutated. Whereas this results in no change in this
position in pAD8, a mutation is introduced here in
pAD91/92. Similarly, two mutations in pAD113 are silent
and do not produce an amino acid change. These silent
mutations likely do not contribute substantially to the
thermostability of the protein.
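
The distinction between silent and non-silent mutations drawn above can be checked mechanically from the codon pairs listed in Table 6. The Python sketch below is only an illustration under stated assumptions: it uses the standard genetic code, the helper names are hypothetical, and the numbering rule (base 1 is the A of the initiating ATG, residue 1 is the serine that follows the initial Met) is inferred from the positions given in Table 6 rather than stated explicitly in the text.

```python
# Illustrative classification of the Table 6 codon changes (hypothetical helpers).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built in the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def residue_number(base_position: int) -> int:
    """Map a 1-based base position to the residue number used in Table 6.

    Assumes base 1 is the A of ATG and that the initial Met is not counted.
    """
    return (base_position - 1) // 3


def classify_change(original_codon: str, mutant_codon: str) -> str:
    """Return 'silent' or a one-letter description such as 'M->I'."""
    before = CODON_TABLE[original_codon]
    after = CODON_TABLE[mutant_codon]
    return "silent" if before == after else f"{before}->{after}"


if __name__ == "__main__":
    # (base position, original codon, mutant codon) rows copied from Table 6.
    rows = [(774, "ATG", "ATA"), (285, "ACT", "ACC"),
            (284, "ACT", "ATT"), (331, "TTC", "CTC")]
    for position, original, mutant in rows:
        print(position, residue_number(position), classify_change(original, mutant))
```

For example, positions 284 and 285 both map to residue 94, but only the change at 284 (ACT to ATT) alters the encoded amino acid, which matches the contrast drawn between pAD91/92 and pAD8.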

EXAMPLE 11: Further thermostabilization
of HLADH proteins

This example describes the means by which the
thermostable proteins identified and characterized as in
the prior examples can be further thermostabilized.

Using the new mutants as a starting point, the
process applied here can be reiterated to increase the
thermostability of the HLADH enzyme even further.
Namely, it is expected that combinations of the
identified HLADH mutations, or combinations of these
mutations with other HLADH mutations, can further
thermostabilize the enzyme.

In order to do this, the new thermoinactivation
limits need to be defined as described in Example 3.
This is followed by a new round of mutagenesis performed
as described in Examples 8, 9, and 10. In addition, the
identified mutations can be put together in differing
combinations by in vitro site-directed mutagenesis and
further molecular biology methods (see, e.g., Sambrook et
al., Molecular Cloning: A Laboratory Manual (Cold Spring
Harbor Laboratory Press, NY, 1989)) that include DNA
shuffling via PCR methods (Stemmer et al., Proc. Natl.
Acad. Sci., 91, 10747-10751 (1994a); Stemmer et al.,
Nature, 370, 389-391 (1994b)). As they have done in the
past, these methods are all expected to give further
increases in the levels of thermostability of the enzyme
or in another similarly screened-for trait.

All of the references cited herein, including
patents, patent applications, sequences, and
publications, are hereby incorporated in their entireties
by reference.

While this invention has been described with an
emphasis upon preferred embodiments, it will be obvious
to those of ordinary skill in the art that variations in
the preferred embodiments can be used, including
variations due to improvements in the art, and that the
invention can be practiced otherwise than as specifically
described herein. Accordingly, this invention includes
all modifications encompassed within the spirit and scope
of the invention as defined by the following claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

     (i) APPLICANT: DAVID C. DEMIRJIAN
                    IGOR A. BRIKUN
                    MALCOLM J. CASADABAN
                    VERONIKA VONSTEIN

    (ii) TITLE OF INVENTION: Method For The Stabilization Of Proteins And The
          Thermostabilized Alcohol Dehydrogenases Produced Thereby

   (iii) NUMBER OF SEQUENCES: 24

    (iv) CORRESPONDENCE ADDRESS:
          (A) ADDRESSEE: McDonnell Boehnen Hulbert & Berghoff
          (B) STREET: 300 South Wacker Drive
          (C) CITY: Chicago
          (D) STATE: Illinois
          (E) COUNTRY: United States
          (F) ZIP: 60606

     (v) COMPUTER READABLE FORM:
          (A) MEDIUM TYPE: Floppy disk
          (B) COMPUTER: IBM PC compatible
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.30

    (vi) CURRENT APPLICATION DATA:
          (A) APPLICATION NUMBER:
          (B) FILING DATE:
          (C) CLASSIFICATION:


(2) INFORMATION FOR SEQ ID NO:1:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1128 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG  48
    Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
    1               5                   10                  15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG  96
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
                20                  25                  30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG GCC ACA GGA ATT TGT CGC  144
Lys Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg
            35                  40                  45

TCA GAT GAC CAC GTG GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG  192
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
        50                  55                  60

ATC GCA GGC CAT GAG GCA GCG GGC ATT GTG GAG AGC ATT GGA GAA GGC  240
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly
    65                  70                  75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA CTC TTT ACT CCC  288
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro
80                  85                  90                  95

CAG TGT GGA AAA TGC AGG GTT TGT AAG CAC CCT GAA GGC AAC TTC TGC  336
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys
                100                 105                 110

TTG AAA AAT GAT CTG AGC ATG CCT CGG GGA ACC ATG CAG GAT GGT ACC  384
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
            115                 120                 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTC CTT GGC ACC  432
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
        130                 135                 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG GCC AAG  480
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
    145                 150                 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT GGA  528
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
160                 165                 170                 175

TTT TCT ACT GGT TAT GGG TCT GCA GTC AAG GTT GCC AAG GTC ACC CAG  576
Phe Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln
                180                 185                 190

GGC TCC ACC TGT GCC GTG TTT GGC CTT GGA GGA GTG GGC CTG TCT GTT  624
Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val
            195                 200                 205

ATC ATG GGC TGT AAA GCA GCC GGA GCG GCC AGG ATC ATT GGG GTG GAC  672
Ile Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp
        210                 215                 220

ATC AAC AAA GAC AAG TTT GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG  720
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
    225                 230                 235

TGT GTC AAC CCT CAG GAC TAC AAG AAA CCC ATC CAG GAG GTG CTG ACA  768
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240                 245                 250                 255

GAA ATG AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA GTC ATT GGT CGG  816
Glu Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg
                260                 265                 270

CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC TGT CAA GAA GCA TAT GGT  864
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly
            275                 280                 285

GTG AGC GTC ATT GTG GGA GTA CCT CCT GAT TCC CAA AAT CTC TCT ATG  912
Val Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met
        290                 295                 300

AAT CCT ATG TTG CTA CTG AGT GGA CGT ACC TGG AAA GGA GCT ATT TTT  960
Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe
    305                 310                 315

GGC GGT TTT AAG AGT AAA GAT TCT GTC CCC AAA CTT GTG GCC GAT TTT  1008
Gly Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe
320                 325                 330                 335

ATG GCT AAA AAG TTT GCA CTG GAT CCT TTA ATC ACC CAT GTT TTA CCT  1056
Met Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro
                340                 345                 350

TTT GAA AAA ATA AAT GAA GGA TTT GAC CTG CTT CGC TCT GGA GAG AGT  1104
Phe Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser
            355                 360                 365

ATC CGT ACC ATC CTG ACG TTT TGA  1128
Ile Arg Thr Ile Leu Thr Phe
        370

(2) INFORMATION FOR SEQ ID NO:2:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 374 amino acids
          (B) TYPE: amino acid
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1               5                   10                  15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
            20                  25                  30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
        35                  40                  45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
    50                  55                  60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65                  70                  75                  80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
                85                  90                  95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu
            100                 105                 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
        115                 120                 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
    130                 135                 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145                 150                 155                 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
                165                 170                 175

Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
            180                 185                 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
        195                 200                 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
    210                 215                 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225                 230                 235                 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
                245                 250                 255

Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
            260                 265                 270

Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
        275                 280                 285

Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
    290                 295                 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305                 310                 315                 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
                325                 330                 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
            340                 345                 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
        355                 360                 365

Arg Thr Ile Leu Thr Phe
    370



(2) INFORMATION FOR SEQ ID NO:3:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1128 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG  48
    Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
    1               5                   10                  15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG  96
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
                20                  25                  30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG GCC ACA GGA ATT TGT CGC  144
Lys Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg
            35                  40                  45

TCA GAT GAC CAC GTG GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG  192
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
        50                  55                  60

ATC GCA GGC CAT GAG GCA GCG GGC ATT GTG GAG AGC ATT GGA GAA GGC  240
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly
    65                  70                  75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA CTC TTT ACT CCC  288
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro
80                  85                  90                  95

CAG TGT GGA AAA TGC AGG GTT TGT AAG CAC CCT GAA GGC AAC TTC TGC  336
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys
                100                 105                 110

TTG AAA AAT GAT CTG AGC ATG CCT CGG GGA ACC ATG CAG GAT GGT ACC  384
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
            115                 120                 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTC CTT GGC ACC  432
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
        130                 135                 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG GCC AAG  480
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
    145                 150                 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT GGA  528
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
160                 165                 170                 175

TTT TCT ACT GGT TAT GGG TCT GCA GTC AAG GTT GCC AAG GTC ACC CAG  576
Phe Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln
                180                 185                 190

GGC TCC ACC TGT GCC GTG TTT GGC CTT GGA GGA GTG GGC CTG TCT GTT  624
Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val
            195                 200                 205

ATC ATG GGC TGT AAA GCA GCC GGA GCG GCC AGG ATC ATT GGG GTG GAC  672
Ile Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp
        210                 215                 220

ATC AAC AAA GAC AAG TTT GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG  720
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
    225                 230                 235

TGT GTC AAC CCT CAG GAC TAC AAG AAA CCC ATC CAG GAG GTG CTG ACA  768
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240                 245                 250                 255

GAA ATA AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA GTC ATT GGT CGG  816
Glu Ile Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg
                260                 265                 270

CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC TGT CAA GAA GCA TAT GGT  864
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly
            275                 280                 285

GTG AGC GTC ATT GCG GGA GTA CCT CCT GAT TCC CAA AAT CTC TCT ATG  912
Val Ser Val Ile Ala Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met
        290                 295                 300

AAT CCT ATG TTG CTA CTG AGT GGA CGT ACC TGG AAA GGA GCT ATT TTT  960
Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe
    305                 310                 315

GGC GGT TTT AAG AGT AAA GAT TCT GTC CCC AAA CTT GTG GCC GAT TTT  1008
Gly Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe
320                 325                 330                 335

ATG GCT AAA AAG TTT GCA CTG GAT CCT TTA ATC ACC CAT GTT TTA CCT  1056
Met Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro
                340                 345                 350

TTT GAA AAA ATA AAT GAA GGA TTT GAC CTG CTT CGC TCT GGA GAG AGT  1104
Phe Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser
            355                 360                 365

ATC CGT ACC ATC CTG ACG TTT TGA  1128
Ile Arg Thr Ile Leu Thr Phe
        370



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 374 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
85 90 95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Ile Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
260 265 270

Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Ala Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370

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(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:






ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG 48 
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
1 5 10 15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG 96
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
20 25 30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG GCC ACA GGA ATT TGT CGC
Lys Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg
35 40 45

TCA GAT GAC CAC GTG GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
50 55 60

ATC GCA GGC CAT GAG GCA GCG GGC ATT GTG GAG AGC ATT GGA GAA GGC
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly
65 70 75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA CTC TTT ACC CCC 288
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro
80 85 90 95

CAG TGT GGA AAA TGC AGG GTT TGT AAG CAC CCT GAA GGC AAC TTC TGC 336
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys
100 105 110

TTG AAA AAT GAT CTG AGC ATG CCT CGG GGA ACC ATG CAG GAT GGT ACC 384
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
115 120 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTC CTT GGC ACC 432
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
130 135 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG GCC AAG 480
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
145 150 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT GGA 528
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
160 165 170 175

TTT TCT ACT GGT TAT GGG TCT GCA GTC AAG GTT GCC AAG GTC ACC CAG 576
Phe Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln
180 185 190

GGC TCC ACC TGT GCC GTG TTT GGC CTT GGA GGA GTG GGC CTG TCT GTT 624
Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val
195 200 205

ATC ATG GGC TGT AAA GCA GCC GGA GCG GCC AGG ATC ATT GGG GTG GAC 672
Ile Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp
210 215 220

ATC AAC AAA GAC AAG TTT GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG 720
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
225 230 235

TGT GTC AAC CCT CAG GAC TAC AAG AAA CCC ATC CAG GAG GTG CTG ACA 768
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240 245 250 255

GAA ATG AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA GCC ATT GGT CGG 816
Glu Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Ala Ile Gly Arg
260 265 270
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CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC TGT CAA GAA GCA TAT GGT 864
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly
275 280 285

GTG AGC GTC ATT GTG GGA GTA CCT CCT GAT TCC CAA AAT CTC TCT ATG 912
Val Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met
290 295 300

AAT CCT ATG TTG CTA CTG AGT GGA CGT ACC TGG AAA GGA GCT ATT TTT 960
Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe
305 310 315

GGC GGT TTT AAG AGT AAA GAT TCT GTC CCC AAA CTT GTG GCC GAT TTT 1008
Gly Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe
320 325 330 335

ATG GCT AAA AAG TTT GCA CTG GAT CCT TTA ATC ACC CAT GTT TTA CCT 1056
Met Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro
340 345 350

TTT GAA AAA ATA AAT GAA GGA TTT GAC CTG CTT CGC TCT GGA GAG AGT 1104
Phe Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser
355 360 365

ATC CGT ACC ATC CTG ACG TTT TGA 1128
Ile Arg Thr Ile Leu Thr Phe
370

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 374 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
85 90 95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

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Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Ala Ile Gly Arg Leu
260 265 270

Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG 
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
1 5 10 15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
20 25 30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG GCC ACA GGA ATT TGT CGC
Lys Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg
35 40 45

TCA GAT GAC CAC GTG GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
50 55 60

ATC GCA GGC CAT GAG GCA GCG GGC ATT GTG GAG AAC ATT GGA GAA GGC
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Asn Ile Gly Glu Gly
65 70 75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA CTC TTT ACT CCC
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro
80 85 90 95

CAG TGT GGA AAA TGC AGG GTT TGT AAG CAC CCT GAA GGC AAC TTC TGC
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys
100 105 110

TTG AAA AAT GAT CTG AGC ATG CCT CGG GGA ACC ATG CAG GAT GGT ACC
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
115 120 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTC CTT GGC ACC
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
130 135 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG GCC AAG
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
145 150 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT GGA
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
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160 165 170 175 

TTT TCT ACT GGT TAT GGG TCT GCA GTC AAG GTT GCC AAG GTC ACC CAG 576
Phe Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln
180 185 190

GGC TCC ACC TGT GCC GTG TTT GGC CTT GGA GCA GTG GGC CTG TCT GTT 624
Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val
195 200 205

ATC ATG GGC TGT AAA GCA GCC GGA GCG GCC AGG ATC ATT GGG GTG GAC 672
Ile Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp
210 215 220

ATC AAC AAA GAC AAG TTT GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG 720
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
225 230 235

TGT GTC AAC CCT CAG GAC TAC AAG AAA CCC ATC CAG GAG GTG CTG ACA 768
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240 245 250 255

GAA ATG AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA GTC ATT GGT CGG 816
Glu Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg
260 265 270

CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC TGT CAA GAA GCA TAT GGT 864
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly
275 280 285

GTG AGC GTC ATT GTG GGA GTA CCT CCT GAT TCC CAA AAT CTC TCT ATG 912
Val Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met
290 295 300

AAT CCT ATG TTG CTA CTG AGT GGA CGT ACC TGG AAA GGA GCT ATT TTT 960
Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe
305 310 315

GGC GGT TTT AAG AGT AAA GAT TCT GTC CCC AAA CTT GTG GCC GAT TTT 1008
Gly Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe
320 325 330 335

ATG GCT AAA AAG TTT GCA CTG GAT CCT TTA ATC ACC CAT GTT TTA CCT 1056
Met Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro
340 345 350

TTT GAA AAA ATA AAT GAA GGA TTT GAC CTG CTT CCC TCT GGA GAG AGT 1104
Phe Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser
355 360 365

ATC CGT ACC ATC CTG ACG TTT TGA 1128
Ile Arg Thr Ile Leu Thr Phe
370

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 374 amino acids
(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Asn Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
85 90 95
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Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu 
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
260 265 270

Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG 48
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
1 5 10 15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG 96
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
20 25 30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG GCC ACA GGA ATT TGT CGC 144
Lys Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg
35 40 45

TCA GAT GAC CAC GTG GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG 192
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
50 55 60



SUBSTITUTE SHEET (RULE 26) 



WO 98/51802 



PCT/US98/09627 



43 

ATC GCA GGC CAT GAG GCA GCG GGC ATT GTG GAG AGC ATT GGA GAA GGC 240 
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly
65 70 75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA CTC TTT ATT CCC 288
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Ile Pro
80 85 90 95

CAG TGT GGA AAA TGC AGG GTT TGT AAG CAC CCT GAA GGC AAC TTC TGC 336
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys
100 105 110

TTG AAA AAT GAT CTG AGC ATG CCT CGG GGA ACC ATG CAG GAT GGT ACC 384
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
115 120 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTC CTT GGC ACC
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
130 135 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG GCC AAG
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
145 150 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT GGA 528
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
160 165 170 175

TTT TCT ACT GGT TAT GGG TCT GCA GTC AAG GTT GCC AAG GTC ACC CAG 576
Phe Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln
180 185 190

GGC TCC ACC TGT GCC GTG TTT GGC CTT GGA GGA GTG GGC CTG TCT GTT 624
Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val
195 200 205

ATC ATG GGC TGT AAA GCA GCC GGA GCG GCC AGG ATC ATT GGG GTG GAC 672
Ile Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp
210 215 220

ATC AAC AAA GAC AAG TTT GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG 720
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
225 230 235

TGT GTC AAC CCT CAG GAC TAC AAG AAA CCC ATC CAG GAG GTG CTG ACA 768
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240 245 250 255

GAA ATG AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA GTC ATT GGT CGG 816
Glu Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg
260 265 270

CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC TGT CAA GAA GCA TAT GGT 864
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly
275 280 285

GTG AGC GTC ATT GTG GGA GTA CCT CCT GAT TCC CAA AAT CTC TCT ATG 912
Val Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met
290 295 300

AAT CCT ATG TTG CTA CTG AGT GGA CGT ACC TGG AAA GGA GCT ATT TTT 960
Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe
305 310 315

GGC GGT TTT AAG AGT AAA GAT TCT GTC CCC AAA CTT GTG GCC GAT TTT 1008
Gly Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe
320 325 330 335

ATG GCT AAA AAG TTT GCA CTG GAT CCT TTA ATC ACC CAT GTT TTA CCT 1056
Met Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro
340 345 350

TTT GAA AAA ATA AAT GAA GGA TTT GAC CTG CTT CGC TCT GGA GAG AGT 1104
Phe Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser
355 360 365

ATC CGT ACC ATC CTG ACG TTT TGA
Ile Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:
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(A) LENGTH: 374 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Ile Pro Gln
85 90 95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
260 265 270

Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS: 
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(A) LENGTH: 1128 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG 
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
1 5 10 15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG 96
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
20 25 30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG GCC ACA GGA ATT TGT CGC 144
Lys Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg
35 40 45

TCA GAT GAC CAC GTG GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG 192
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
50 55 60

ATC GCA GGC CAT GAG GCA GCG GGC ATT GTG GAG AGC ATT GGA GAA GGC 240
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly
65 70 75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA CTC TTT ACT CCC 288
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro
80 85 90 95

CAG TGT GGA AAA TGC AGG GTT TGT AAG CAC CCT GAA GGC AAC TTC TGC 336
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys
100 105 110

TTG AAA AAT GAT CTG AGC ATG CCT CGG GGA ACC ATG CAG GAT GGT ACC 384
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
115 120 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTC CTT GGC ACC 432
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
130 135 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG GCC AAG 480
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
145 150 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT GGA 528
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
160 165 170 175

TTT TCT ACT GGT TAT GGG TCT GCA GTC AAG GTT GCC AAG GTC ACC CAG 576
Phe Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln
180 185 190

GGC TCC ACC TGT GCC GTG TTT GGC CTT GGA GGA GTG GGC CTG TCT GTT 624
Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val
195 200 205

ATC ATG GGC TGT AAA GCA GCC GGA GCG GCC AGG ATC ATT GGG GTG GAC 672
Ile Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp
210 215 220

ATC AAC AAA GAC AAG TTT GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG 720
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
225 230 235

TGT GTC AAC CCT CAG GAC TAC AAG AAA CCC ATC CAG GAG GTG CTG ACA 768
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240 245 250 255

GAA ATG AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA GTC ATT GGT CGG 816
Glu Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg
260 265 270

CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC AGT CAA GAA GCA TAT GGT 864
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Ser Gln Glu Ala Tyr Gly
275 280 285

GTG AGC GTC ATT GTG GGA GTA CCT CCT GGT TCC CAA AAT CTC TCT ATG 912
Val Ser Val Ile Val Gly Val Pro Pro Gly Ser Gln Asn Leu Ser Met
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290 295 



300 



£n X? ^ P A m AGT «» TGG AAA GGA GCT ATT TTT 

Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe



315 



15 



350 

360 365 



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 374 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
85 90 95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
260 265 270



960 



325 330 3 3 5 

s s ss S E .S S 5 s a s S S SIT SI s - 

0 345 



1104 



ATC CGT ACC ATC CTG ACG TTT TGA 1128
Ile Arg Thr Ile Leu Thr Phe
370 
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Asp Thr Met Val Thr Ala Leu Ser Cys Ser Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Val Gly Val Pro Pro Gly Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1128 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG 
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
1 5 10 15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
20 25 30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG GCC ACA GGA ATT TGT CGC
Lys Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg
35 40 45

TCA GAT GAC CAC GTG GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
50 55 60

ATC GCA GGC CAT GAG GCA GCG GGC ATT GTG GAG AGC ATT GGA GAA GGC
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly
65 70 75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA CTC TTT ACT CCC
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro
80 85 90 95

CAG TGT GGA AAA TGC AGG GTT TGT AAG CAC CCT GAA GGC AAC TTC TGC
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys
100 105 110

TTG AAA AAT GAT CTG AGC ATG CCT CGG GGA ACC ATG CAG GAT GGT ACC
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
115 120 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTC CTT GGC ACC
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
130 135 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG GCC AAG
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
145 150 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT GGA
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
160 165 170 175

TTT TCT ACT GGT TAT GGG TCT GCA GTC AAG GTT GCC AAG GTC ACC CAG
Phe Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln
180 185 190 
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GGC TCC ACC TGT GCC GTG TTT GGC CTT GGA GGA GTG GGC CTG TCT GTT 
Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val 
195 200 205 

ATC ATG GGC TGT AAA GCA GCC GGA GCG GCC AGG ATC ATT GGG GTG GAC 
Ile Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp
210 215 220

ATC AAC AAA GAC AAG TTT GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
225 230 235

TGT GTC AAC CCT CAG GAC TAC AAG AAA CCC ATC CAG GAG GTG CTG ACA
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240 245 250 255

GAA ATA AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA GTC ATT GGT CGG
Glu Ile Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg
260 265 270

CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC TGT CAA GAA GCA TAT GGT
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly
275 280 285

GTG AGC GTC ATT GCG GGA GTA CCT CCT GAT TCC CAA AAT CTC TCT ATG
Val Ser Val Ile Ala Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met
290 295 300

AAT CCT ATG TTG CTA CTG AGT GGA CGT ACC TGG AAA GGA GCT ATT TTT
Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe
305 310 315

GGC GGT TTT AAG AGT AAA GAT TCT GTC CCC AAA CTT GTG GCC GAT TTT
Gly Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe
320 325 330 335

ATG GCT AAA AAG TTT GCA CTG GAT CCT TTA ATC ACC CAT GTT TTA CCT
Met Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro
340 345 350

TTT GAA AAA ATA AAT GAA GGA TTT GAC CTG CTT CGC TCT GGA GAG AGT
Phe Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser
355 360 365

ATC CGT ACC ATC CTG ACG TTT TGA
Ile Arg Thr Ile Leu Thr Phe
370 
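
Each nucleotide row in these listings is paired with the residues it encodes, so the two rows can be cross-checked mechanically. The following is a minimal sketch of such a check, assuming the standard genetic code; the names `translate_row` and `row_mismatches` are illustrative only and do not appear in the specification. The residue rows begin at Ser 1, so the initiator Met is skipped for the first row of a listing.

```python
# Standard genetic code, with codon positions ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",  # stop codon
}

# Map every codon to its three-letter residue code.
CODON_TABLE = {
    a + b + c: THREE[AMINO[16 * i + 4 * j + k]]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_row(dna_row):
    """Translate one listing row; position counts such as '48' are ignored."""
    codons = [t for t in dna_row.split() if len(t) == 3 and set(t) <= set(BASES)]
    return [CODON_TABLE[c] for c in codons]


def row_mismatches(dna_row, residue_row, skip=0):
    """Return (position in row, translated, printed) where the rows disagree."""
    translated = translate_row(dna_row)[skip:]
    printed = residue_row.split()
    return [
        (i + 1, t, p)
        for i, (t, p) in enumerate(zip(translated, printed))
        if t != p
    ]


first_row = "ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG 48"
residues = "Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp"
print(row_mismatches(first_row, residues, skip=1))
```

A row that translates cleanly yields an empty list; a misread residue such as "He" for "Ile" is reported with its position in the row.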



(2) INFORMATION FOR SEQ ID NO:14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 374 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
85 90 95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125 
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Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Ile Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
260 265 270

Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Ala Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:



ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG 48
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
1 5 10 15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG 96
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
20 25 30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG GCC ACA GGA ATT TGT CGC 144
Lys Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg
35 40 45

TCA GAT GAC CAC GTG GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG 192
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
50 55 60

ATC GCA GGC CAT GAG GCA GOT GGC ATT GTG GAG AGC ATT GGA GAA GGC 240
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly
65 70 75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA CTC TTT ACT CCC 288
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro
80 85 90 95

G?n Ttl 5? A TGC AGG GTT TGT AAG CAC CCT GAA GGC AAC TTC TGC 336
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys
100 105 110

™ 2£ »TC CCT CGG GGA AX ATG CAG GAT GOT ACC 384
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
115 120 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTC CTT GGC ACT 432
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
130 135 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG Gor aar 480
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
145 150 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT rri 528
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
160 165 170 175

EESSSSEBSKEESSSa
180 185 190

GGC TCC ACC TGT GCC GTG TTT GGC CTT GGA GGA GTG GGC CTG TCT GTT 624
Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val
195 200 205

s = s s: « s s ssjesssss s
210 215 220

ill f AC ^ m GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG 720
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
225 230 235

SI 52 Aan p£ atn ^ C ™ C CCC ATC GAG GTG CTG ACA 768
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240 245 250 255

GAA ATG AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA GTC ATT GGT CGG 816
Glu Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg
260 265 270

CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC TGT CAA GAA GCA TAT GGT 864
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly
275 280 285

SI? ser vTl Ue SI? g?v f* GAT T ° C ** T CTC TCT A ™ 912
Val Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met
290 295 300

™l FT A I G ™ ™ fTG ACT GGA CGT ACC TGG AAA GGA GCT ATT TTT 960
Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe
305 310 315

Glv 115 III ££5 AGT i** GAT TCT GTC CCC AAA CTT GTG GCC GAT TTT 1008
Gly Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe
320 325 330 335

ATG GCT AAA AAG TTT GCA CTG GAT CCT TTA ATC ACC CAT GTT TTA CCT 1056
Met Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro
340 345 350

Phe Glu Lvs n» ^ G ? A TTT GAC CTG CTT CGC TCT GGA GAG AGT 1104
Phe Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser
355 360 365

ATC CGT ACC ATC CTG ACG TTT TGA 1128
Ile Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 374 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
85 90 95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Thr Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
260 265 270

Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG 48
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
1 5 10 15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG 96
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
20 25 30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG GCT ACA GGA ATT TGT CGC 144
Lys Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg
35 40 45

TCA GAT GAC CAC GTA GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG 192
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
50 55 60

ATC GCA GGC CAT GAG GCA GCG GGC ATT GTG GAG AGC ATT GGA GAA GGC 240
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly
65 70 75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA CTC TTT ACT CCC 288
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro
80 85 90 95

7f» I GT S? A ^ 7X30 AGG GTT TCT ^ CCT GAA GGC AAC CTC TGC 336
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Leu Cys
100 105 110

TTG AAA AAT GAT CTG AGC ATG CCT CGG GGA ACC ATG CAG GAT GGT ACC 384
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
115 120 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTC CTT GGC ACC 432
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
130 135 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG GCC AAG 480
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
145 150 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT GGA 528
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
160 165 170 175

TTT TCT ACT GGT TAT GGG TCT GCA GTC AAG GTT GCC AAG GTC ACC CAG 576
Phe Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln
180 185 190

GGC TCC ACC TGT GCC GTG TTT GGC CTT GGA GGA GTG GGC CTG TCT GTT 624
Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val
195 200 205

ATC ATG GGC TGT AAA GCA GCC GGA GCG GCC AGG ATC ATT GGG GTG GAC 672
Ile Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp
210 215 220

ATC AAC AAA GAC AAG TTT GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG 720
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
225 230 235

JF* S T ? J* 0 ^ GAC TAC CCC A TC CAG GAG GTG CTG ACA 768
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240 245 250 255

GAA ATG AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA GTC ATT GGT CGG 816
Glu Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg
260 265 270

CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC TGT CAA GAA GCA TAT GGT 864
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly
275 280 285

GTG AGC GTC ATT GTG GGA GTA CCT CCT GAT TCC CAA AAT CTC TCT ATG 912
Val Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met
290 295 300

AAT CCT ATG TTG CTA CTG AGT GGA CGT ACC TGG AAA GGA GCT ATT TTT 960
Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe
305 310 315

GGC GGT TTT AAG AGT AAA GAT TCT GTC CCC AAA CTT GTG GCC GAT TTT 1008
Gly Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe
320 325 330 335

ATG GCT AAA AAG TTT GCA CTG GAT CCT TTA ATC ACC CAT GTT TTA CCT 1056
Met Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro
340 345 350

TTT GAA AAA ATA AAT GAA GGA TTT GAC CTG CTT CGC TCT GGA GAG AGT 1104
Phe Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser
355 360 365

ATC CGT ACC ATC CTG ACG TTT TGA 1128
Ile Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 374 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Ala Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Ser Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Thr Pro Gln
85 90 95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Leu Cys Leu
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val Ile Gly Arg Leu
260 265 270

Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:



ATG AGC ACA GCA GGA AAA GTA ATA AAA TGC AAA GCG GCT GTG CTG TGG 48
Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp
1 5 10 15

GAG GAA AAG AAA CCA TTT TCC ATC GAG GAG GTG GAG GTT GCA CCC CCG 96
Glu Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro
20 25 30

AAG GCC CAT GAA GTC CGT ATA AAG ATG GTG NNN ACA GGA ATT TGT rrr 144
Lys Ala His Glu Val Arg Ile Lys Met Val ™ Thr Gly Ile Cys Arg
35 40 45

TCA GAT GAC CAC NNN GTT AGT GGA ACC CTT GTC ACA CCT CTT CCT GTG 192
Ser Asp Asp His Val Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val
50 55 60

aS G^v £E G?u £° 2?° *™ GTG GAG ATT GGA GAA GGC 240
Ile Ala Gly His Glu Ala Ala Gly Ile Val Glu Xaa Ile Gly Glu Gly
65 70 75

GTC ACT ACA GTA AGA CCA GGT GAT AAA GTC ATC CCA rrr ttt -hum ,™ 288
Val Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Xaa Pro
80 85 90 95

CAG TGT GGA AAA TGC AGG GTT TGT AAG CAC CCT GAA GGC AAC NNN TGC 336
Gln Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Xaa Cys
100 105 110

S?^^ GA I P" 0 * GC ATG CCT ^ GGA ACC ATG CAG GAT GGT ACC 384
Leu Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr
115 120 125

AGC AGG TTC ACC TGC AGA GGG AAG CCC ATC CAC CAC TTr rrr nnn ,™ 432
Ser Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr
130 135 140

AGC ACC TTC TCC CAG TAC ACC GTG GTG GAC GAG ATC TCA GTG GCC AAG 480
Ser Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys
145 150 155

ATC GAT GCG GCC TCA CCG CTG GAG AAA GTC TGT CTC ATT GGC TGT rra 528
Ile Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly
160 165 170 175

TTT NNN ACT GGT TAT GGG TCT GCA GTC AAG GTT GCC AAG GTC ACC CAG 576
Phe Xaa Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln
180 185 190

S5SSgSSSSS^S5S2«2 ES ». -
195 200 205

210 215 220

ATC AAC AAA GAC AAG TTT GCA AAG GCC AAA GAA GTG GGT GCC ACT GAG 720
Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu
225 230 235

TGT GTC AAC CCT CAG GAC TAC AAG AAA CCC ATC CAG GAG GTG CTG ACA 768
Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr
240 245 250 255

GAA NNN AGC AAT GGA GGT GTG GAT TTT TCC TTT GAA NNN ATT GGT CGG 816
Glu Xaa Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Xaa Ile Gly Arg
260 265 270

CTC GAC ACT ATG GTG ACT GCC TTG TCA TGC NNN CAA GAA GCA TAT GGT 864
Leu Asp Thr Met Val Thr Ala Leu Ser Cys Xaa Gln Glu Ala Tyr Gly
275 280 285

GTG AGC GTC ATT NNN GGA GTA CCT CCT NNN TCC CAA AAT CTC TCT ATG 912
Val Ser Val Ile Xaa Gly Val Pro Pro Xaa Ser Gln Asn Leu Ser Met
290 295 300

AAT CCT ATG TTG CTA CTG AGT GGA CGT ACC TGG AAA GGA GCT ATT TTT 960
Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe
305 310 315

GGC GGT TTT AAG AGT AAA GAT TCT GTC CCC AAA CTT GTG GCC GAT TTT 1008
Gly Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe
320 325 330 335

ATG GCT AAA AAG TTT GCA CTG GAT CCT TTA ATC ACC CAT GTT TTA CCT 1056
Met Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro
340 345 350

TTT GAA AAA ATA AAT GAA GGA TTT GAC CTG CTT CGC TCT GGA GAG AGT 1104
Phe Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser
355 360 365

ATC CGT ACC ATC CTG ACG TTT TGA 1128
Ile Arg Thr Ile Leu Thr Phe
370



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 374 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:



Ser Thr Ala Gly Lys Val Ile Lys Cys Lys Ala Ala Val Leu Trp Glu
1 5 10 15

Glu Lys Lys Pro Phe Ser Ile Glu Glu Val Glu Val Ala Pro Pro Lys
20 25 30

Ala His Glu Val Arg Ile Lys Met Val Xaa Thr Gly Ile Cys Arg Ser
35 40 45

Asp Asp His Xaa Val Ser Gly Thr Leu Val Thr Pro Leu Pro Val Ile
50 55 60

Ala Gly His Glu Ala Ala Gly Ile Val Glu Xaa Ile Gly Glu Gly Val
65 70 75 80

Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Xaa Pro Gln
85 90 95

Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Xaa Cys Leu
100 105 110

Lys Asn Asp Leu Ser Met Pro Arg Gly Thr Met Gln Asp Gly Thr Ser
115 120 125

Arg Phe Thr Cys Arg Gly Lys Pro Ile His His Phe Leu Gly Thr Ser
130 135 140

Thr Phe Ser Gln Tyr Thr Val Val Asp Glu Ile Ser Val Ala Lys Ile
145 150 155 160

Asp Ala Ala Ser Pro Leu Glu Lys Val Cys Leu Ile Gly Cys Gly Phe
165 170 175

Xaa Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys Val Thr Gln Gly
180 185 190

Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly Leu Ser Val Ile
195 200 205

Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile Gly Val Asp Ile
210 215 220

Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly Ala Thr Glu Cys
225 230 235 240

Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu Val Leu Thr Glu
245 250 255

Xaa Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Xaa Ile Gly Arg Leu
260 265 270

Asp Thr Met Val Thr Ala Leu Ser Cys Xaa Gln Glu Ala Tyr Gly Val
275 280 285

Ser Val Ile Xaa Gly Val Pro Pro Xaa Ser Gln Asn Leu Ser Met Asn
290 295 300

Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly Ala Ile Phe Gly
305 310 315 320

Gly Phe Lys Ser Lys Asp Ser Val Pro Lys Leu Val Ala Asp Phe Met
325 330 335

Ala Lys Lys Phe Ala Leu Asp Pro Leu Ile Thr His Val Leu Pro Phe
340 345 350

Glu Lys Ile Asn Glu Gly Phe Asp Leu Leu Arg Ser Gly Glu Ser Ile
355 360 365

Arg Thr Ile Leu Thr Phe
370
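
SEQ ID NO:20 marks its variable residues as Xaa. The following is a small sketch, not taken from the original disclosure, of how the numbered positions of those residues can be read off the listing. It assumes rows of sixteen residues, as printed, and uses the two rows covering residues 65 to 96 with the three-letter codes normalized (Ile, Gln). The function name `xaa_positions` is hypothetical.

```python
def xaa_positions(rows: list[str], first_position: int) -> list[int]:
    """Return the 1-based positions of 'Xaa' in consecutive residue rows.

    `first_position` is the sequence position of the first residue in rows[0].
    """
    residues = " ".join(rows).split()
    return [first_position + i for i, res in enumerate(residues) if res == "Xaa"]


# Residues 65-96 of SEQ ID NO:20 as printed in the listing.
rows_65_to_96 = [
    "Ala Gly His Glu Ala Ala Gly Ile Val Glu Xaa Ile Gly Glu Gly Val",
    "Thr Thr Val Arg Pro Gly Asp Lys Val Ile Pro Leu Phe Xaa Pro Gln",
]
print(xaa_positions(rows_65_to_96, first_position=65))  # [75, 94]
```

Both positions found in these two rows, 75 and 94, appear in the list of positions recited in claims 10 and 11.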






(2) INFORMATION FOR SEQ ID NO: 21: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

CCCCGAATTC TCAAAACGTC AGGATGGTAC G 31

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 44 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

CCCCTCTAGA ATAAATGAGC ACAGCAGGAA AAGTAATAAA ATGC 44 
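
SEQ ID NO:21 and SEQ ID NO:22 look like amplification primers for the coding sequences listed above, although this part of the listing does not state their role. The sketch below is an interpretation, not the authors' description. It checks that the 3' portion of SEQ ID NO:22 matches the first ten codons of the gene, and that the 3' portion of SEQ ID NO:21 is the reverse complement of the last seven codons, including the TGA stop. GAATTC and TCTAGA are the recognition sequences of EcoRI and XbaI. The split points (10 and 14 nucleotides) and the variable names are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# First ten codons and final codon row of the gene, as printed for SEQ ID NO:17.
gene_start = "ATGAGCACAGCAGGAAAAGTAATAAAATGC"
gene_end = "ATCCGTACCATCCTGACGTTTTGA"

seq21 = "CCCCGAATTCTCAAAACGTCAGGATGGTACG"                # 31 nt
seq22 = "CCCCTCTAGAATAAATGAGCACAGCAGGAAAAGTAATAAAATGC"   # 44 nt

# Assumed layout: 4-nt clamp, 6-nt restriction site, then the annealing region
# (SEQ ID NO:22 carries four extra nucleotides, ATAA, before the ATG).
forward_site, forward_core = seq22[4:10], seq22[14:]
reverse_site, reverse_core = seq21[4:10], seq21[10:]

print(forward_site, forward_core == gene_start)
print(reverse_site, reverse_complement(reverse_core) == gene_end[-21:])
```

Under these assumptions both comparisons hold, which is consistent with SEQ ID NO:22 annealing at the start codon and SEQ ID NO:21 annealing across the stop codon on the opposite strand.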



WHAT IS CLAIMED IS:

1. A method of obtaining a nonnative protein
having a thermostability that is increased over that of
the native version of said protein, wherein said method
comprises:

(a) obtaining in a vector a gene that encodes said
native protein;

(b) mutating said vector at more than one position
in said gene to produce a vector library of cells
comprising mutated versions of said gene;

(c) introducing said vector library en masse into
cells of a strain in which the majority of said mutated
versions of said gene are transcribed and translated to
produce a cell library;

(d) screening said cell library to identify a cell
comprising a mutated version of said gene that encodes a
nonnative protein having a thermostability that is
increased over that of the wild-type version of said
protein; and

(e) purifying said cell from said cell library.

2. The method of claim 1 which further comprises
isolating from said cell in a vector said mutated version
of said gene and, on said mutated version of said gene,
repeating steps (b) through (e).

3. The method of claim 1 wherein said protein is 
an alcohol dehydrogenase. 

4. The method of claim 1 wherein said protein is
horse liver alcohol dehydrogenase.

5. The method of claim 1, wherein said screen is
carried out in the presence of alcohol.

6. The method of claim 1, wherein said screen is
carried out at an increased temperature.

7. The method of claim 1, wherein said strain is
either Escherichia coli or Thermus flavus.

8. A method for selecting against growth of
Escherichia coli recombinant cells which comprise levels
of alcohol dehydrogenase that are higher than those of
wild-type Escherichia coli cells, wherein said method
comprises growing said recombinant cells under conditions
selected from the group consisting of wherein ethanol is
present in a concentration of about 10%, isopropanol is
present in a concentration of about 4%, and propanol is
present in a concentration of about 2%, with the proviso
that said wild-type cells exhibit reduced or an absence
of growth under said conditions.

9. A method for selecting for growth of Thermus
flavus recombinant cells which comprise levels of alcohol
dehydrogenase that are higher than those of wild-type
Thermus flavus cells, wherein said method comprises
growing said recombinant cells under conditions selected
from the group consisting of wherein ethanol is present
in a concentration of about 1% in a liquid or solid
medium at a pH of about 7.0, and isopropanol is present
in a concentration of from about 0.5% to about 1% in a
liquid or solid medium at a pH of about 7.0, with the
proviso that said wild-type cells exhibit reduced or an
absence of growth under said conditions.

10. A method of increasing the thermostability of
horse liver alcohol dehydrogenase, which comprises
introducing into a gene which encodes said alcohol
dehydrogenase a mutation at a codon which codes for an
amino acid residue at a position selected from the group
consisting of amino acid positions 75, 94, 110, 177, 257,
268, 282, 292, and 297.

11. A method of increasing the thermostability of
horse liver alcohol dehydrogenase, which comprises
changing an amino acid residue at a position selected
from the group consisting of amino acid positions 75, 94,
110, 177, 257, 268, 282, 292, and 297.

12. An isolated and purified nucleic acid
comprising a sequence selected from the group consisting
of SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9,
SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17,
and SEQ ID NO:19.

13. An isolated and purified protein comprising a
sequence selected from the group consisting of SEQ ID
NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID
NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, and SEQ
ID NO:20.

14. A plasmid comprising the nucleic acid sequence
of claim 12.

15. A plasmid selected from the group consisting of
pAD7, pAD8, pAD10, pAD91, pAD92, pAD93, pAD95, pAD111,
pAD113, and pTG450.

16. A vector library comprising an isolated and
purified mixture of vectors comprising mutated versions
of a horse liver alcohol dehydrogenase gene.

17. A host cell comprising a plasmid according to
claim 14.

18. A host cell comprising a plasmid according to
claim 15.

19. A host cell according to claim 17, wherein said
cell is a member of the genus of Thermus or Escherichia.

20. A host cell according to claim 18, wherein said 
cell is strain TGF650. 

21. A cell library comprising an isolated and
purified mixture of cells obtained by transformation en
masse with the vector library of claim 16.
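
Claims 10 and 11 recite a set of residue positions. As an illustrative sketch only, two of the listed variant proteins can be compared row by row to see which positions differ and whether those positions belong to the recited set. The rows below are residues 97 to 112 and 289 to 304 as printed for SEQ ID NO:14 and SEQ ID NO:18, with three-letter codes normalized. The function name `substitutions` is hypothetical, and nothing here indicates which variant is the wild type.

```python
CLAIMED_POSITIONS = {75, 94, 110, 177, 257, 268, 282, 292, 297}


def substitutions(row_a: str, row_b: str, first_position: int) -> list[tuple[int, str, str]]:
    """List (position, residue_a, residue_b) wherever two aligned rows differ."""
    a, b = row_a.split(), row_b.split()
    if len(a) != len(b):
        raise ValueError("rows must contain the same number of residues")
    return [
        (first_position + i, x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


# (first position, row from SEQ ID NO:14, row from SEQ ID NO:18)
rows = [
    (97,
     "Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Phe Cys Leu",
     "Cys Gly Lys Cys Arg Val Cys Lys His Pro Glu Gly Asn Leu Cys Leu"),
    (289,
     "Ser Val Ile Ala Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn",
     "Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn Leu Ser Met Asn"),
]

differences = [d for start, a, b in rows for d in substitutions(a, b, start)]
for position, res_a, res_b in differences:
    print(position, res_a, "->", res_b, position in CLAIMED_POSITIONS)
```

In these two rows the variants differ at positions 110 (Phe versus Leu) and 292 (Ala versus Val), both of which are among the recited positions.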
Fig. 4